A Comparison of Approximate String Matching Algorithms
نویسندگان
چکیده
Experimental comparison of the running time of approximate string matching algorithms for the k differences problem is presented. Given a pattern string, a text string, and integer k, the task is to find all approximate occurrences of the pattern in the text with at most k differences (insertions, deletions, changes). We consider seven algorithms based on different approaches including dynamic programming, Boyer-Moore string matching, suffix automata, and the distribution of characters. It turns out that none of the algorithms is the best for all values of the problem parameters, and the speed differences between the methods can be considerable.
منابع مشابه
Approximate string matching as an algebraic computation
Approximate string matching has a long history and employs a wide variety of methods (see e.g. the survey [2]). We consider a variant of approximate matching that compares a fixed pattern string to every substring in the text string by a rational-weighted edit distance (e.g. the indel distance, defined as the number of character insertions and deletions, or the indelsub/Levenshtein distance, wh...
متن کاملAn Empirical Comparison of Approaches to Approximate String Matching in Private Record Linkage
Due to the frequency of spelling and typographical errors in practical applications, record linkage algorithms have to use string similarity functions. In many legal contexts, identifiers such as names have to be encrypted before a record linkage can be attempted. Therefore, algorithms for computing string similarity functions with encrypted identifiers are essential for approximating string ma...
متن کاملData structures and algorithms for approximate string matching
This paper surveys techniques for designing efficient sequential and parallel approximate string matching algorithms. Special attention is given to the methods for the construction of data structures that efficiently support primitive operations needed in approximate string matching.
متن کاملA Unified View to String Matching Algorithms
We present a uniied view to sequential algorithms for many pattern matching problems, using a nite automaton built from the pattern which uses the text as input. We show the limitations of deterministic nite automata (DFA) and the advantages of using a bitwise simulation of non-deterministic nite automata (NFA). This approach gives very fast practical algorithms which have good complexity for s...
متن کاملAverage-Optimal Multiple Approximate String Matching
We present a new algorithm for multiple approximate string matching, based on an extension of the optimal (on average) singlepattern approximate string matching algorithm of Chang and Marr. Our algorithm inherits the optimality and is also competitive in practice. We present a second algorithm that is linear time and handles higher difference ratios. We show experimentally that our algorithms a...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Softw., Pract. Exper.
دوره 26 شماره
صفحات -
تاریخ انتشار 1996